Natural Language Identification using Corpus-Based Models
Similar resources
Spoken language identification using the speechdat corpus
Current language identification systems vary significantly in their complexity. The systems that use higher level linguistic information have the best performance. Nevertheless, that information is hard to collect for each new language. The system presented in this paper is easily extendable to new languages because it uses very little linguistic information. In fact, the presented system needs...
Unsupervised Natural Language Processing Using Graph Models
In the past, NLP has always been based on the explicit or implicit use of linguistic knowledge. In classical computer linguistic applications explicit rule based approaches prevail, while machine learning algorithms use implicit knowledge for generating linguistic knowledge. The question behind this work is: how far can we go in NLP without assuming explicit or implicit linguistic knowledge? Ho...
Web Text Corpus for Natural Language Processing
Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpus by downloading web pages to create a topic-diverse collection of 10 billion words of English. We show that for context-sensitive spelling correction the Web Corpus results are better than using a search engine. For t...
Corpus Design For Biomedical Natural Language Processing
This paper classifies six publicly available biomedical corpora according to various corpus design features and characteristics. We then present usage data for the six corpora. We show that corpora that are carefully annotated with respect to structural and linguistic characteristics and that are distributed in standard formats are more widely used than corpora that are not. These findings have...
Journal
Journal title: HERMES - Journal of Language and Communication in Business
Year: 2017
ISSN: 1903-1785,0904-1699
DOI: 10.7146/hjlcb.v7i13.25083